ABSTRACT 

Background: The Permanent Resident Database of Citizenship and Immigration Canada (CIC) contains socio- 
demographic information on immigrants but lacks ethnic group classifications. To enhance its usability for ethnicity- 
related research, we categorized immigrants in the CIC database into one of Canada's official visible minority groups or 
a white category using their country of birth and mother tongue. 

Methods: Using public data sources, we classified each of 267 country names and 245 mother tongues in the CIC data 
into 1 of 10 visible minority groups (South Asian, Chinese, black, Latin American, Filipino, West Asian, Arab, Southeast 
Asian, Korean, and Japanese) or a white group. We then used country of birth alone (method A) or country of birth plus 
mother tongue (method B) to classify 2.5 million people in the CIC database who immigrated to Ontario between 1985 
and 2010 and who had a valid encrypted health card number. We validated the ethnic categorizations using linked self- 
reported ethnicity data for 6499 people who responded to the Canadian Community Health Survey (CCHS). 

Results: Among immigrants listed in the CIC database, the 4 most frequent visible minority groups as classified by 
method B were South Asian (n = 582 812), Chinese (n = 400 771), black (n = 254 189), and Latin American (n = 179 118). 
Methods A and B agreed in 94% of the categorizations (kappa coefficient 0.94, 95% confidence interval [CI] 0.93-0.94). 
Both methods A and B agreed with self-reported CCHS ethnicity in 86% of all categorizations (for both comparisons, 
kappa coefficient 0.83, 95% CI 0.82-0.84). Both methods A and B had high sensitivity and specificity for most visible
minority groups when validated using self-reported ethnicity from the CCHS (e.g., with method B, sensitivity and specificity
were, respectively, 0.85 and 0.97 for South Asians, 0.93 and 0.99 for Chinese, and 0.90 and 0.97 for blacks). 

Interpretation: The use of country of birth and mother tongue is a validated and practical method for classifying
immigrants to Canada into ethnic categories. 

As one of the most ethnically diverse countries,
Canada is home to individuals of over 200 ethnic origins.
Canada's growing diversity is due primarily to high
levels of immigration. Since the 1990s, about 250 000
immigrants have arrived annually. The major sources
of Canada's immigrants are Asia, Europe, the Caribbean,
South and Central America, Africa, and the United States.
In Ontario, Canada's most populous province, the 2006 Census
identified that 23% of the population belonged to an ethnic
minority group, with the largest groups being South
Asian, Chinese, and black. From 2007 to 2011, 42% of all
Canadian immigrants landed in Ontario.

The increasingly multi-ethnic nature of society in
Canada and other countries is fuelling a need for ethnicity
data to permit better understanding of these
diverse populations. For example, in health research,
ethnicity classifications can be used to better understand
the etiology of disease, the respective roles of
environment and genetics in health and disease, and
the health status of disadvantaged groups, as well as to
improve health care delivery and target specific public
health interventions toward high-risk populations.
However, such classifications may also have associated
weaknesses, such as contributing to racialized
identities, a social concept denoting power inequality
between ethnic or racial groups, which has been suggested
to have negative health implications.

The concept of ethnicity is complex and its definition
challenging. The concepts of ethnicity and race are
sometimes used synonymously, although they do not
overlap completely:

• Ethnicity has been defined as "the social group a
person belongs to, and either identifies with or is
identified with by others, as a result of a mix of
cultural and other factors including language,
diet, religion, ancestry, and physical features
traditionally associated with race."

• Race has been defined "by historical and common
usage," as "the group (sub-species in traditional
scientific use) a person belongs to as a result of a
mix of physical features such as skin colour and
hair texture, which reflect ancestry and geographical
origins, as identified by others or, increasingly,
as self-identified."

Although ethnicity has long been recognized as an 
important covariate in health research, individual- 
level ethnicity data are rarely collected in Canadian 
health care data sets. Similarly, although some ethni- 
city data are captured in Canada's census, these data 
are restricted to Statistics Canada and therefore cannot 
be linked to many other administrative data sets avail- 
able in Canada's provinces and territories. In an effort 
to address this gap, alternative ethnicity classifications 
have been used. 

The various methods used to define and assign ethnicity
include surname-based approaches, geocoding of
residential address, and classification based on country
of birth, language, or a combination of these. Country
of birth in particular has been widely collected in
many administrative and government data sets and
represents an objective and potentially valuable source
of ethnicity information.

Given Canada's high immigration rate, the Perma- 
nent Resident Database of Citizenship and Immigration 
Canada (CIC) may be a useful source of ethnicity data 
for health research. In the past decade, this database 
has been used for socio-economic and health studies 
of immigrants in various Canadian provinces. The
CIC data provide detailed prelanding demographic 
and socio-economic information, including country of 
birth, for all Canadian immigrants. However, this data 
set lacks self-reported ethnicity, and the large number 
of options for country of birth and mother tongue with- 
in this data set (over 200 options for each variable) can 
also make it challenging to use for such purposes. In an 
attempt to improve the practical use of this database for 
ethnicity-related research projects, we describe here a 
method for classifying Ontario CIC data records into 
Canada's 10 official visible minority ethnic groups (plus 
a white group) using either country of birth or country 
of birth plus mother tongue variables. We also report 
validation of this method using information from Sta- 
tistics Canada's Canadian Community Health Survey 
(CCHS), a large population-based telephone survey of 
the Canadian population which includes self-reported 
ethnicity information, the current "gold standard" for 
ethnicity classification. 

Methods 

CIC Permanent Resident Database. The CIC Perma- 
nent Resident Database provides detailed socio-demo- 
graphic information for all legal immigrants to Canada, 
including country of birth, citizenship, country of last 
permanent residence, and mother tongue. For this an- 
alysis, we used the CIC data set held at the Institute for 
Clinical Evaluative Sciences, which pertains to Ontario 
immigrants who arrived between 1985 and 2010. This 
data set includes 267 options for country of birth and 
245 options for mother tongue. The Ontario CIC database
has been used as a source of ethnicity data for previous
health research studies.

Study population. The CIC data set used for this study 
contains records for 2.9 million immigrants who 
landed in Ontario over the period from 1985 to 2010. 
We excluded about 400 000 records because of an inability
to identify a valid health card number in the
Ontario Registered Persons Database; the health card
number was required for record linkage to the self-re- 
ported ethnicity data that we used for validation pur- 
poses. The reasons for absence of a valid health card 
number are multifactorial and include immigrants' 
departure from the province shortly after arrival (i.e., 
before registering for a health card number), as well as 
typographic inconsistencies in the CIC database or the 
Registered Persons Database (or both). Landed immi- 
grants become eligible for health care benefits after a 
3-month waiting period. Records for the remaining 2.5 
million Ontario immigrants could be linked to other 
administrative databases available at the Institute for 
Clinical Evaluative Sciences. All data were de-identified 
and health card numbers were encrypted to protect 
privacy. 

Classification of ethnic groups. The CIC database lacks 
ethnic or visible minority group classifications. To fa- 
cilitate use of the CIC data for health research, we 
tested 2 methods for classifying the immigrants in this 
data set into 11 ethnic categories, specifically the 10 of- 
ficial visible minority groups used by Statistics Canada 
(South Asian, Chinese, black, Latin American, Fili- 
pino, West Asian, Arab, Southeast Asian, Korean, and 
Japanese) and a white category. According to the Employment
Equity Act, visible minorities are defined as
"persons, other than aboriginal peoples, who are
non-Caucasian in race or non-white in colour." For this
study, we first tested country of birth alone (method A) 
and then country of birth plus mother tongue (method 
B) to classify the Ontario CIC data set. 

Method A (country of birth). We mapped each of the
267 country-of-birth names in the Ontario CIC data
set (including previous country-of-birth names changed
for political reasons) to 1 of 12 categories: the 10 visible
minority groups specified by Statistics Canada, a
white category, and an "excluded" category. We used
a combination of publicly available resources for this
purpose, including Statistics Canada's ethnic origin
categories for the 2006 Census of Population (our preferred
source), the United Nations Standard Country
or Area Codes for Statistical Use (also known as
the M49 list), the World Bank list of economies (as of
July 2012), and The World Factbook of the US Central
Intelligence Agency. These resources consider the
ethnic mix of countries and provide additional information
needed to appropriately assign each country to
its predominant ethnic group.



The 10 visible minority categories used by Statistics 
Canada are heterogeneous. Whereas some categories 
are associated with a single country, and classification 
is straightforward (e.g., the country of Japan was as- 
signed to the Japanese category), other categories, such 
as South Asian and Latin American, relate to geograph- 
ic regions and include multiple countries. For example, 
we assigned the countries in South America and most 
of those in Central America to the Latin American cat- 
egory. In contrast, categories such as black and Arab 
may be considered primarily ethnocultural classifica- 
tions associated with overlapping geopolitical bound- 
aries (see methodological details in online Appendix A). 

The white category was used for European countries 
and those with populations of predominantly European 
origin (e.g., Australia). The "excluded" category was cre- 
ated for immigrants whose countries of birth were not 
accounted for by the 10 major visible minority groups 
defined by Statistics Canada or the white category as 
defined above and those whose CIC data were irregular 
(e.g., "country not stated" or "British Overseas Citizen"). 
Further details are provided in online Appendix A. 
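
The country-of-birth step amounts to a lookup table. The following is a minimal sketch, assuming a small hand-written subset of the mapping: only the examples named in the text are included, the full 267-label mapping in online Appendix A is not reproduced, and the names COUNTRY_TO_GROUP and classify_method_a are hypothetical.

```python
from typing import Optional

# Illustrative subset only. The study mapped 267 country-of-birth labels
# (online Appendix A); these entries are the few examples named in the text.
# The dictionary and function names are hypothetical.
COUNTRY_TO_GROUP = {
    "Japan": "Japanese",            # single-country category
    "Australia": "White",           # population of predominantly European origin
    "Guyana": "Latin American",     # South American country
    "Country not stated": "Excluded",
    "British Overseas Citizen": "Excluded",
}


def classify_method_a(country_of_birth: str) -> Optional[str]:
    """Return the category for a country label, or None if it is unmapped."""
    return COUNTRY_TO_GROUP.get(country_of_birth.strip())


print(classify_method_a("Japan"))
print(classify_method_a("Country not stated"))
```

Returning None for an unmapped label, rather than silently assigning "Excluded", is a design choice of this sketch so that unexpected labels can be reviewed by hand.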

Method B (country of birth plus mother tongue). In an
effort to further refine the classification based on country
of birth, we then completed a second classification
based on country of birth plus mother tongue. The ethnic
makeup of many countries is heterogeneous, and
there may be individuals whose ethnic background
differs from the predominant ethnocultural group (or
groups) of their country of birth. For instance, a person
may be born to South Asian parents in a country
with a predominantly white population (e.g., the United
Kingdom). In such cases, a person's mother tongue may
be more representative of his or her ethnic background
than his or her country of birth.

We mapped each of the 245 mother tongues in the 
Ontario CIC data set to 1 of 15 categories: the 10 Statis- 
tics Canada visible minority groups, a white category, 
and 4 additional language categories ("world language," 
"other," "excluded," and "unknown"). Publicly available 
data sources, such as Ethnologue: Languages of the
World and The World Factbook, were used to gather
language information and assign each mother tongue 
to an ethnic group. For instance, Cantonese and Man- 
darin were categorized as Chinese, and Persian and 
Kurdish were categorized as West Asian. 

The "world language" category, created to account for 
languages spoken in multiple categories and by various 
visible minority groups, comprised English, French, 
Spanish, Portuguese, and Russian. Less specific language
options were assigned to the closest category (e.g.,
"other European languages" to white) or to the "other" 
category (e.g., Hebrew) (see methodological details in 
online Appendix A). The "excluded" language category 
was created for 3 languages (Busan, Uzbek, Samoan) 
associated with the countries in the "excluded" cat- 
egory defined in method A (as described in the previ- 
ous subsection). The "unknown" category was created 
for languages for which a region of origin or single eth- 
nic group could not be identified. The number of immi- 
grants speaking languages categorized as "excluded" or 
"unknown" was relatively small (< 0.1% of total sample). 

We developed an algorithm to determine a final cat- 
egory for each individual immigrant record using both 
methods: country of birth (method A) and country of 
birth plus mother tongue (method B) (see the flow chart 
in online Appendix B). 
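
The actual decision rules are given in the flow chart in online Appendix B, which is not reproduced here. As a rough sketch only, one plausible precedence rule is shown below: a mother tongue that maps to a specific ethnic group overrides the country-based category, while "world language," "other," "excluded," and "unknown" languages fall back to the country of birth. That precedence, the fallback values, and all names in the code are assumptions made for illustration, not the authors' algorithm.

```python
# Hypothetical sketch of a country-plus-language rule. The precedence used
# here (a specific language group overrides the country category) is an
# assumption; the study's real rules are in its online Appendix B.
ETHNIC_GROUPS = {
    "South Asian", "Chinese", "Black", "Latin American", "Filipino",
    "West Asian", "Arab", "Southeast Asian", "Korean", "Japanese", "White",
}

# Only examples mentioned in the text; the study mapped 245 mother tongues.
LANGUAGE_TO_CATEGORY = {
    "Cantonese": "Chinese",
    "Mandarin": "Chinese",
    "Persian": "West Asian",
    "Kurdish": "West Asian",
    "English": "World language",
    "French": "World language",
    "Spanish": "World language",
    "Portuguese": "World language",
    "Russian": "World language",
    "Hebrew": "Other",
}

COUNTRY_TO_GROUP = {
    "United Kingdom and Colonies": "White",
    "Japan": "Japanese",
}


def classify_method_b(country_of_birth: str, mother_tongue: str) -> str:
    """Combine country and language categories under the assumed precedence."""
    country_group = COUNTRY_TO_GROUP.get(country_of_birth, "Excluded")
    language_category = LANGUAGE_TO_CATEGORY.get(mother_tongue, "Unknown")
    if language_category in ETHNIC_GROUPS:
        return language_category   # language points to one specific group
    return country_group           # non-specific language: keep country result


print(classify_method_b("United Kingdom and Colonies", "Cantonese"))
print(classify_method_b("Japan", "English"))
```

Under this assumed rule, a UK-born immigrant whose mother tongue is Cantonese is reassigned from white to Chinese, which mirrors the kind of reclassification the text describes for people born in predominantly white countries.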

Validation of classification accuracy. For validation, we
compared the ethnic group assigned by each of our 2
methods with self-reported ethnic group data in Statistics
Canada's CCHS, a population-based cross-sectional
health telephone survey of Canadians aged 12 years and
older. More specifically, we used respondents' answers
to the following question: "People living in Canada come
from many different cultural and racial backgrounds. Are
you [white, South Asian, etc.]?" We used encrypted health
card numbers to link data from 4 cycles of the CCHS
(2000/2001 to 2007/2008) with the CIC data set for Ontario.

We calculated percent agreement and simple kappa statistics
to compare classification by methods A and B with the CCHS
self-reported ethnic classification (the reference standard).
Overall percent agreement was defined as the number of
similar ratings by the 2 methods divided by the total number
of ratings. Sensitivity, specificity, and positive and
negative predictive values were calculated for each visible
minority category for comparisons of methods A and B with
the CCHS classification. We used SAS version 9.2 (SAS
Institute Inc., Cary, North Carolina) for all statistical
analyses.

Geographic visualization. We used ArcGIS Desktop
software version 10 (ESRI, Redlands, California) to
create a map showing the global distribution of major
ethnic groups associated with the countries of birth of
Ontario immigrants, as recorded in the CIC database.
A data set for world country boundaries was obtained
from the website thematicmapping.org.

Ethics approval. This project received ethics approval
through the Research Ethics Board of Sunnybrook
Health Sciences Centre.

Results

The study sample from the Ontario CIC data set consisted
of 2 500 514 immigrants with mean age ± standard
deviation (SD) of 30 ± 17 years at the time of
landing, of whom 51% were female. The top 3 countries
of origin were India, China, and the Philippines, and
the top 3 mother tongues were English, Mandarin, and
Cantonese (Table 1).

Figure 1 displays the world distribution of the ethnic
groups assigned to countries of birth in our sample. A
list of all countries and mother tongues in the Ontario
CIC data with the assigned ethnic categories
is available by contacting the corresponding
author.

Table 1

Top 20 countries of birth and mother tongues of immigrants recorded
in the Citizenship and Immigration Canada (CIC) Permanent Resident Database
who landed in Ontario from 1985 to 2010

Rank  Country of birth*                 No. of immigrants  Mother tongue*  No. of immigrants
1     India                             296 805            English         365 194
2     China, People's Republic of       263 450            Mandarin        170 317
3     Philippines                       163 223            Cantonese       166 533
4     Pakistan                          134 967            Tagalog         143 603
5     Sri Lanka                         96 110             Arabic          135 219
6     Hong Kong                         94 038             Punjabi         134 238
7     Poland                            78 368             Urdu            129 566
8     Iran                              74 957             Spanish         123 156
9     Jamaica                           72 782             Tamil           96 376
10    United States of America          60 155             Russian         81 746
11    United Kingdom and Colonies       58 180             Polish          78 601
12    Guyana                            50 643             Gujarati        58 070
13    Korea, Republic of                41 005             Chinese         48 702
14    Vietnam, Socialist Republic of    39 666             Portuguese      48 365
15    Romania                           38 223             Hindi           45 879
16    Trinidad and Tobago, Republic of  34 819             Farsi           43 637
17    Yugoslavia                        34 788             Korean          41 489
18    Russia                            34 523             Romanian        37 099
19    Iraq                              33 648             Bengali         37 024
20    Bangladesh                        32 331             Persian         33 692

*The country and language labels are as per the CIC data formats and therefore may refer to old names
in some cases.

The 2 methods used to classify immi- 
grants (i.e., on the basis of country of birth 
alone or on the basis of country of birth 
plus mother tongue) resulted in some dif- 
ferences in categorization (Table 2). For 
instance, among 523 855 immigrants clas- 
sified as white by country of birth, 8271 and 
2502 individuals were classified as South 
Asian and Chinese, respectively, by coun- 
try of birth plus mother tongue. Methods A 
and B showed agreement for 94% of the rat- 
ings (kappa coefficient 0.94, 95% confidence 
interval [CI] 0.93-0.94). 
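
For readers unfamiliar with the agreement statistics, the sketch below computes overall percent agreement and Cohen's simple (unweighted) kappa from a square cross-tabulation of two classifications. The 2 x 2 table is a made-up toy example, not study data, and the function name is hypothetical; the authors ran their analyses in SAS.

```python
def percent_agreement_and_kappa(table):
    """Return (observed agreement, Cohen's kappa) for a square count table.

    table[i][j] is the number of records placed in category i by the first
    classification and in category j by the second.
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    # Observed agreement: share of records on the diagonal.
    observed = sum(table[i][i] for i in range(k)) / n
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    # Agreement expected by chance from the marginal totals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Toy data (hypothetical): 100 records, 2 categories.
toy_table = [[40, 10],
             [5, 45]]
agreement, kappa = percent_agreement_and_kappa(toy_table)
print(f"agreement={agreement:.2f}, kappa={kappa:.2f}")  # agreement=0.85, kappa=0.70
```

Kappa discounts the agreement that would be expected by chance alone, which is why it can sit below the raw percent agreement when a few categories dominate the sample.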

From the Ontario CCHS data set (n = 
134 567), we linked 6585 records to the CIC 
data set. Of these, 86 individuals with mul- 
tiple ethnicities were excluded, leaving 6499 
for the validation analysis. For these 6499 
CCHS respondents, the mean age ± SD was 
29 ± 15 years at the time of landing, and 
52% were female. For the vast majority of 
the respondents, self-reported ethnicity in 
the CCHS data matched the ethnic group as- 
signed by our method B (Table 3) or method 
A (see online Appendix C). 

Ethnic categorization by either method A 
(country of birth alone) or method B (coun- 
try of birth plus mother tongue) agreed with 
the self-reported CCHS ethnic group for 
86% of respondents (kappa coefficient 0.83, 
95% CI 0.82-0.84, for both comparisons). 

When the classification accuracy of 
method B was compared with self-reported 
ethnicity from the CCHS, consistently high 
specificity and negative predictive values 
were found for all groups (Table 4). Sensi- 
tivity for the Southeast Asian category and 
positive predictive values for the Latin 
American, Southeast Asian, and West Asian 
categories were relatively lower. For the 
majority of classification indices, method 
B (country of birth plus mother tongue) 
showed a slight improvement in categoriza- 
tion over method A (country of birth alone) 
or no change (see online Appendix D for 
validation results for method A). 
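
The four accuracy indices can be derived from a one-versus-rest reading of the cross-tabulation. The sketch below uses the white row and column totals from Table 3 (2021 on the diagonal, 2300 self-reported, 2059 classified by method B, 6499 overall). Counting the "excluded" column in the denominators is an assumption of this sketch, and the function name is hypothetical. The results come out at roughly 0.88, 0.99, 0.98, and 0.94, close to the white row of Table 4; small second-decimal differences may reflect rounding conventions.

```python
def one_vs_rest_indices(tp, self_reported_total, classified_total, n):
    """Accuracy indices for one category against all others combined.

    tp: records both self-reported and classified in the category.
    self_reported_total: row total (reference standard) for the category.
    classified_total: column total (method under test) for the category.
    n: total number of records in the table.
    """
    fn = self_reported_total - tp   # in the category by self-report only
    fp = classified_total - tp      # in the category by the method only
    tn = n - tp - fn - fp           # in the category by neither
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# White category, using the Table 3 margins.
white = one_vs_rest_indices(tp=2021, self_reported_total=2300,
                            classified_total=2059, n=6499)
for name, value in white.items():
    print(f"{name}: {value:.3f}")
```

Because only row and column totals are needed, the same calculation works for every category in Table 3 even though most off-diagonal cells are suppressed.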
Table 3

Frequency of immigrants to Ontario in each self-reported ethnic category (based on the Canadian Community Health Survey
[CCHS]) and as classified using country of birth plus mother tongue (method B), based on data in the Citizenship
and Immigration Canada Permanent Resident Database*†

Columns: classification by method B (using country of birth plus mother tongue)

Self-reported                            South                         Latin                West            Southeast
(CCHS)             Excluded     White     Asian   Chinese     Black  American  Filipino     Asian      Arab     Asian    Korean  Japanese     Total
White                    40      2021       <10       <10        19        91       <10        74        40       <10       <10       <10      2300
South Asian              26       <10      1105       <10        60        66       <10        13       <10       <10       <10       <10      1296
Chinese                 <10       <10       <10       720       <10       <10        12       <10       <10        22       <10       <10       769
Black                   <10        12       <10       <10       586        26       <10       <10        15       <10       <10       <10       645
Latin American          <10       <10       <10       <10        11       364       <10       <10       <10       <10       <10       <10       389
Filipino                <10       <10       <10       <10       <10       <10       321       <10       <10        27       <10       <10       355
West Asian              <10       <10        10       <10        16        11       <10       155        16       <10       <10       <10       224
Arab                    <10       <10       <10       <10       <10       <10       <10        12       185       <10       <10       <10       212
Southeast Asian         <10       <10        96        12       <10       <10        12       <10       <10        61       <10       <10       205
Korean                  <10       <10       <10       <10       <10       <10       <10       <10       <10       <10        80       <10        83
Japanese                <10       <10       <10       <10       <10       <10       <10       <10       <10       <10       <10        19        21
Total                    96      2059      1227       735       715       562       354       259       262       128        82        20      6499

* Specific data for cells with value < 10, including cells with value 0, were suppressed to protect privacy. Row and column totals are the true sums, including the suppressed
values.

† Values in cells along the diagonal are shown in bold to highlight similar classification by the 2 methods.



Table 4

Validation of ethnic classification using country of birth plus mother tongue (method B), with self-reported ethnicity
(from Canadian Community Health Survey) as reference (n = 6499)

Classification                                                   Positive predictive    Negative predictive
by method B        Sensitivity (95% CI)   Specificity (95% CI)   value (95% CI)         value (95% CI)
White              0.87 (0.86-0.89)       0.99 (0.98-0.99)       0.98 (0.97-0.98)       0.93 (0.92-0.94)
South Asian        0.85 (0.83-0.87)       0.97 (0.97-0.98)       0.90 (0.88-0.91)       0.96 (0.95-0.96)
Chinese            0.93 (0.91-0.95)       0.99 (0.99-0.99)       0.97 (0.96-0.98)       0.99 (0.98-0.99)
Black              0.90 (0.88-0.92)       0.97 (0.97-0.98)       0.81 (0.78-0.84)       0.98 (0.98-0.99)
Latin American     0.93 (0.90-0.95)       0.96 (0.96-0.97)       0.64 (0.60-0.68)       0.99 (0.99-0.99)
Filipino           0.90 (0.86-0.93)       0.99 (0.99-0.99)       0.90 (0.87-0.93)       0.99 (0.99-0.99)
West Asian         0.69 (0.62-0.75)       0.98 (0.98-0.98)       0.59 (0.53-0.65)       0.98 (0.98-0.99)
Arab               0.87 (0.82-0.91)       0.98 (0.98-0.99)       0.70 (0.64-0.76)       0.99 (0.99-0.99)
Southeast Asian    0.29 (0.23-0.36)       0.98 (0.98-0.99)       0.47 (0.38-0.56)       0.97 (0.97-0.98)
Korean             0.96 (0.89-0.99)       0.99 (0.99-1.00)       0.97 (0.91-0.99)       0.99 (0.99-0.99)
Japanese           0.90 (0.69-0.98)       0.99 (0.99-1.00)       0.95 (0.75-0.99)       0.99 (0.99-1.00)

CI = confidence interval.



Interpretation 

We used 2 methods (country of birth alone or country
of birth plus mother tongue) to classify Ontario immigrants
in the CIC data set into 11 predefined ethnic
groups. We found a high degree of agreement between
self-reported ethnic groups from CCHS data and those
assigned by our 2 classification methods. Compared with
country-specific or world region-specific classifications
used previously, our classification by visible minority
groups may be more practical for researchers and health
policy-makers, as it is comparable to other important
population statistics on visible minorities produced by
Statistics Canada and other international organizations.
Our methods may also prove useful (with local customization)
in other countries where health-related information
regarding self-reported ethnicity is not available or
is not routinely collected but data on immigrants' country
of birth and/or mother tongue are available.



Open Medicine 2013;7(4)e91 



Research 



Rezai et al. 



Using country of birth to define ethnicity has been
reported as a robust method for health care research in
countries such as the Netherlands, where this variable
was closely correlated with self-reported ethnicity.
Nevertheless, this method has been criticized, because
the definition of ethnicity is complex and may
not always be determined by geography. Problems can 
arise with multi-ethnic countries (e.g., Australia, the 
United States, South Africa) or with individuals born 
to a family whose ethnicity is different from the pre- 
dominant ethnic group of their country of residence. 
To further investigate this issue, we analyzed the CCHS 
self-reported ethnicity of a subset of the immigrants 
in the CIC data who had a CCHS-linked record and 
came from a large, multi-ethnic country (i.e., United
Kingdom, United States, South Africa, or Australia, as 
defined by country of birth in CIC data). Among these 
CIC-CCHS linked records, 292 (94%) of the 310 im- 
migrants born in the United Kingdom self-identified 
as white, as did 179 (87%) of the 206 immigrants born 
in the United States, 56 (89%) of the 63 born in South 
Africa, and 24 (96%) of the 25 born in Australia. These 
data support the validity of our ethnicity classification 
algorithm for immigrants from these countries. Clas- 
sification methods that use additional information such 
as language and parents' country of birth have been 
shown to improve classification accuracy over methods 
based on country of birth alone. In our study, adding
mother tongue to country of birth resulted in only
slight improvements in ethnicity classification, relative 
to country of birth alone. Methods using mother tongue 
alone to define ethnicity also have their limitations. 
Second- or third-generation immigrants in some coun- 
tries (e.g., the United States) may not share the mother 
tongue of their ancestors. Moreover, native individuals 
may report their mother tongue to be a world language 
(originating from a predominantly white country) that 
is accepted as their birth country's official language 
(e.g., French in Congo, English in India). We controlled 
for the latter problem in our data set by recording the 
world languages spoken as official languages specific to 
such countries. 

Despite uniformly high specificity values, the
sensitivity of our methods to detect Southeast Asian 
immigrants was low relative to self-reported ethnicity. 
Among Southeast Asians there was considerable mis- 
classification into the South Asian category. This result 
may be due to individuals' uncertainty about world geo- 
graphic boundaries for South Asia and Southeast Asia 
(e.g., a South Asian might think that his or her country
is located in Southeast Asia) or self-identification by
country of residence rather than country of birth (e.g., 
a person born in India who lived in Malaysia for a long 
time before immigrating to Canada may self-identify as 
Southeast Asian). We also found relatively low positive 
predictive value for the Latin American group, despite 
the high sensitivity and specificity of our methods. 
Some immigrants from certain Latin American coun- 
tries (e.g., Brazil and Argentina) are descendants of 
European immigrants and self-identify as white. More- 
over, a large proportion of the population in Guyana, 
the Latin American country with the largest number 
of immigrants to Ontario, are members of the South 
Asian diaspora. 

Some limitations may exist for the CCHS data that 
we used to validate our classification methods. First, 
the CCHS population linked to our Ontario CIC data set 
may not be a representative sample of Ontario immi- 
grants. Second, the sample size for some visible minority 
groups (e.g., Japanese and Koreans) was limited, which 
can result in less reliable estimates. Third, self-reported 
ethnicity, although often considered a preferred method 
of ethnic group classification, has some shortcomings. 
Self-reported ethnicity may change over time and can 
be influenced by psychosocial factors, such as the feeling 
of pride that a person attaches to his or her ethnic or na- 
tional identity, uncertainty about ethnic origin, or even 
concern related to disclosing one's ethnicity.
For instance, 5% of the immigrants who self-reported as 
white in the CCHS validation data set were classified as 
West Asian or Arab on the basis of country of birth and 
mother tongue. It is likely that these CCHS participants 
were West Asians or Arabs who reported their ethnicity 
as white on the basis of skin colour. 

In conclusion, in a large data set of Ontario im- 
migrants, we found close agreement between self- 
reported ethnic categories and ethnic categories based 
on country of birth alone or country of birth plus moth- 
er tongue. These findings suggest that the 2 methods 
of ethnic classification described are valid for categor- 
izing most immigrants to Canada into the country's of- 
ficial visible minority groups. Use of a larger validation 
data set in future studies may further illuminate the 
external validity of these methods. 